Estimator Variance in Reinforcement Learning: Theoretical Problems and Practical Solutions
Author

Abstract
In reinforcement learning, as in many on-line search techniques, a large number of estimation parameters (e.g. Q-value estimates for 1-step Q-learning) are maintained and dynamically updated as information comes to hand during the learning process. Excessive variance of these estimators can be problematic, resulting in uneven or unstable learning, or even making effective learning impossible. Estimator variance is usually managed only indirectly, by selecting global learning-algorithm parameters (e.g. for TD-based methods) that are a compromise between an acceptable level of estimator perturbation and other desirable system attributes, such as reduced estimator bias. In this paper we argue that this approach may not always be adequate, particularly for noisy and non-Markovian domains, and present a direct approach to managing estimator variance: the new ccBeta algorithm. Empirical results in an autonomous robotics domain are also presented, showing improved performance using the ccBeta method.
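The trade-off the abstract describes can be illustrated with a minimal sketch (not the paper's ccBeta method): a single state-action pair updated by the standard 1-step Q-learning rule under a noisy reward. The learning rate `alpha` is the kind of global parameter the abstract refers to; a large `alpha` tracks change quickly but leaves the Q estimate perturbed by reward noise, while a small `alpha` damps that variance. All names and the noisy-reward setup here are illustrative assumptions.

```python
import random

def one_step_q_update(q, reward, next_max_q, alpha, gamma=0.9):
    """Standard 1-step Q-learning update: Q <- Q + alpha * (r + gamma*max Q' - Q)."""
    return q + alpha * (reward + gamma * next_max_q - q)

# Single state-action pair, terminal transition (next_max_q = 0),
# reward with mean 1.0 and heavy Gaussian noise.
for alpha in (0.5, 0.05):
    random.seed(0)
    q = 0.0
    tail = []  # Q estimates after settling, to measure residual variance
    for step in range(5000):
        reward = 1.0 + random.gauss(0.0, 2.0)
        q = one_step_q_update(q, reward, next_max_q=0.0, alpha=alpha)
        if step >= 4000:
            tail.append(q)
    mean = sum(tail) / len(tail)
    var = sum((x - mean) ** 2 for x in tail) / len(tail)
    print(f"alpha={alpha}: mean Q ~ {mean:.2f}, residual variance ~ {var:.3f}")
```

Both settings converge to the same mean, but the residual estimator variance differs by roughly an order of magnitude, which is why a single fixed `alpha` is a compromise rather than a solution.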
Similar Resources
Reinforcement Learning in Situated Agents: Some Theoretical Problems and Practical Solutions
In on-line reinforcement learning, often a large number of estimation parameters (e.g. Q-value estimates for 1-step Q-learning) are maintained and dynamically updated as information comes to hand during the learning process. Excessive variance of these estimators can be problematic, resulting in uneven or unstable learning, or even making effective learning impossible. Estimator variance is usua...
Doubly Robust Off-policy Evaluation for Reinforcement Learning
We study the problem of evaluating a policy that is different from the one that generates data. Such a problem, known as off-policy evaluation in reinforcement learning (RL), is encountered whenever one wants to estimate the value of a new solution, based on historical data, before actually deploying it in the real system, which is a critical step of applying RL in most real-world applications....
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
We study the problem of off-policy value evaluation in reinforcement learning (RL), where one aims to estimate the value of a new policy based on data collected by a different policy. This problem is often a critical step when applying RL to real-world problems. Despite its importance, existing general methods either have uncontrolled bias or suffer high variance. In this work, we extend the do...
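The doubly robust estimator this entry describes combines a model-based value estimate with an importance-weighted correction, so that the result is unbiased if either component is accurate. Below is a minimal sketch for the one-step (contextual-bandit) special case; the function names, the two-action setup, and the toy logs are assumptions for illustration, not the paper's full sequential method.

```python
def doubly_robust_value(logs, target_pi, behavior_mu, q_hat):
    """Doubly robust off-policy value estimate, one-step (bandit) case.

    logs: list of (context, action, reward) collected under the behavior policy.
    target_pi(x, a), behavior_mu(x, a): action probabilities of the two policies.
    q_hat(x, a): model of the expected reward.
    Each term is v_hat(x) + rho * (r - q_hat(x, a)), rho = pi/mu.
    """
    total = 0.0
    for x, a, r in logs:
        rho = target_pi(x, a) / behavior_mu(x, a)  # importance weight
        # Model-based baseline: value of the target policy at context x.
        v_hat = sum(target_pi(x, b) * q_hat(x, b) for b in (0, 1))
        total += v_hat + rho * (r - q_hat(x, a))
    return total / len(logs)

# Toy check: uniform behavior policy, target policy always picks action 1,
# deterministic reward r(a) = a, and an exact reward model.
logs = [("x", 0, 0.0), ("x", 1, 1.0)]
pi = lambda x, a: 1.0 if a == 1 else 0.0
mu = lambda x, a: 0.5
q = lambda x, a: float(a)
print(doubly_robust_value(logs, pi, mu, q))  # exact target value: 1.0
```

With an exact reward model the correction term vanishes and the estimate is exact; with a wrong model the importance-weighted term repairs the bias, at the cost of some variance.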
Deep Reinforcement Learning
In reinforcement learning (RL), stochastic environments can make learning a policy difficult due to high degrees of variance. As such, variance reduction methods have been investigated in other works, such as advantage estimation and control-variates estimation. Here, we propose to learn a separate reward estimator to train the value function, to help reduce variance caused by a noisy reward sig...
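The control-variate idea mentioned in this entry can be shown in a few lines: subtracting a constant baseline from the reward in a score-function (REINFORCE) gradient estimate leaves the mean unchanged but can slash the variance. The Bernoulli policy, the reward `r = a`, and all names below are illustrative assumptions, not the entry's proposed method.

```python
import math
import random

def reinforce_grad_estimates(theta, baseline, n=2000, seed=0):
    """Score-function gradient samples for a one-parameter Bernoulli policy.

    Policy: p = sigmoid(theta), action a ~ Bernoulli(p), reward r = a.
    Each sample is d/dtheta log p(a) * (r - baseline); subtracting a
    constant baseline leaves the mean unchanged (unbiased) but changes
    the variance -- the control-variate effect.
    """
    rng = random.Random(seed)
    p = 1.0 / (1.0 + math.exp(-theta))
    grads = []
    for _ in range(n):
        a = 1 if rng.random() < p else 0
        r = float(a)      # illustrative noise-free reward
        score = a - p     # d/dtheta log Bernoulli(a; sigmoid(theta))
        grads.append(score * (r - baseline))
    mean = sum(grads) / n
    var = sum((g - mean) ** 2 for g in grads) / n
    return mean, var

for b in (0.0, 0.5):
    mean, var = reinforce_grad_estimates(theta=0.0, baseline=b)
    print(f"baseline={b}: mean grad ~ {mean:.3f}, variance ~ {var:.4f}")
```

At `theta = 0` both baselines estimate the same true gradient (0.25), but the well-chosen baseline drives the sample variance essentially to zero.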
Multiagent Reinforcement Learning for Multi-Robot Systems: A Survey
Multiagent reinforcement learning for multi-robot systems is a challenging issue in both robotics and artificial intelligence. With ever-increasing interest in theoretical research and practical applications, there have been many efforts towards providing solutions to this challenge. However, there are still many difficulties in scaling up the multiagent reinforcement l...
Journal title:
Volume, issue:
Pages: -
Publication date: 1997